fix: List::MoreUtils PP-mode test failures (RC1-RC5) #515

Merged
fglock merged 6 commits into master from fix/list-moreutils on Apr 20, 2026

@fglock fglock commented Apr 20, 2026

Summary

Fixes 6 of 7 failing subtests in ./jcpan -t List::MoreUtils (v0.430). The full plan and per-RC rationale live in dev/modules/list_moreutils.md.

| | master | this PR |
|---|---|---|
| Total subtests | 4492 | 4533 |
| Failing subtests | 7 (across 8 files) | 1 (in 1 file) |
| Green test files | 53 / 61 | 60 / 61 |

The one remaining failure is indexes.t test 18, a Scalar::Util::weaken-on-temporary test (RC6 in the design doc). That sits on top of PerlOnJava's cooperative-refcount overlay and is being addressed on a separate weaken branch; this PR does not touch it.

Commits

| Commit | RC | Fixes |
|---|---|---|
| db94a5ae1 | RC1 | strict-refs violation for numeric scalar deref: binsert.t, bremove.t, mesh.t, zip6.t |
| a161fa284 | RC3 | `POSIX::setlocale` / `localeconv` / `LC_*` stubs: minmaxstr.t |
| 3bfaffda3 | RC2 | hoist `my` in `EXPR for LIST;` / `EXPR while COND;` statement-modifier bodies to outer scope: unblocks part.t |
| c9b8e05dd | RC4 | `split` emits empty field between zero-width and consuming match at the same offset: mode.t |
| 96c4f92d5 | RC5 | warn `Use of uninitialized value in array\|hash element` on undef subscript: part.t leak-free tests |
| f67c5860c | docs | record progress in dev/modules/list_moreutils.md |

What changed

RC1 — strict-refs for numeric deref (RuntimeScalar.java, RuntimeScalarReadOnly.java)

For a scalar holding an INTEGER or DOUBLE, arrayDeref() used to silently return an empty array and hashDeref() threw Not a HASH reference. Both now throw the perl-compatible Can't use string ("N") as an ARRAY|HASH ref while "strict refs" in use. RuntimeScalarReadOnly picks up the same rule for loop aliases to literals (for my $x (1) { @$x }), but new arrayDerefGet / hashDerefGet overrides keep 1->[0] / 1->{a} silent to match perl's literal-arrow compile-time optimization.
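
The rule can be illustrated with a small Java toy model. This is only a sketch under stated assumptions: ToyScalar, its factory methods, and the readOnly flag are hypothetical and are not PerlOnJava's RuntimeScalar or RuntimeScalarReadOnly. Only the error text and the method names arrayDeref / arrayDerefGet are taken from this PR, and the toy models integers only.

```java
import java.util.List;

// Toy model only: names and structure are illustrative, not PerlOnJava's classes.
class ToyScalar {
    private final Long number;          // non-null when the scalar holds a plain integer
    private final List<Object> array;   // non-null when the scalar is an ARRAY ref
    private final boolean readOnly;     // true for a literal such as the 1 in 1->[0]

    private ToyScalar(Long number, List<Object> array, boolean readOnly) {
        this.number = number;
        this.array = array;
        this.readOnly = readOnly;
    }

    static ToyScalar ofInteger(long n)           { return new ToyScalar(n, null, false); }
    static ToyScalar ofReadOnly(long n)          { return new ToyScalar(n, null, true); }
    static ToyScalar ofArrayRef(List<Object> a)  { return new ToyScalar(null, a, false); }

    /** Models @$x under strict refs: a plain number is not a reference, so die. */
    List<Object> arrayDeref() {
        if (array != null) {
            return array;
        }
        throw new IllegalStateException(
            "Can't use string (\"" + number + "\") as an ARRAY ref while \"strict refs\" in use");
    }

    /** Models $x->[i]: a read-only literal stays silent and yields undef (null here). */
    Object arrayDerefGet(int index) {
        if (array == null && readOnly) {
            return null;
        }
        List<Object> a = arrayDeref();   // dies for a non-literal number
        return (index >= 0 && index < a.size()) ? a.get(index) : null;
    }

    public static void main(String[] args) {
        System.out.println(ofReadOnly(1).arrayDerefGet(0));   // silent, models 1->[0]
        try {
            ofInteger(1).arrayDeref();                        // models: my $x = 1; @$x
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy, ofReadOnly(1).arrayDeref() still throws, mirroring the for my $x (1) { @$x } case, while only the element-access path is exempt.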

RC3 — POSIX stubs (src/main/perl/lib/POSIX.pm)

POSIX.pm already exported setlocale, localeconv, and the LC_* constants, but none of them were actually defined, so any caller got Undefined subroutine &POSIX::setlocale. Added Perl stubs: setlocale returns its locale argument (C by default), localeconv returns a default "C"-locale table, and the LC_* constants are distinct small integers.

RC2 — my hoisting in statement-modifier loops (StatementResolver.java)

Perl treats my @x = EXPR for LIST; as declaring @x in the enclosing scope (so the rest of the block can refer to it) while each iteration creates a fresh instance. The parser now detects this pattern for for/foreach and while/until modifiers, emits a bare my DECL; before the loop, and wraps the while/until body in a BlockNode so the inner my shadows the outer on each iteration. Outer-scope value matches perl: empty / undef.
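
The scoping effect of the hoist can be simulated with a plain scope stack. The sketch below is a hypothetical Java model, not the StatementResolver.java change: HoistSketch and its methods are invented for illustration, and the error text is only modelled after perl's strict-vars diagnostic. It shows why the outer @x stays empty after my @x = (1,2) for 1..3; and why the name is undeclared without the hoisted declaration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scope-stack model of lexical `my` variables.
class HoistSketch {
    private final Deque<Map<String, List<Integer>>> scopes = new ArrayDeque<>();

    void enterScope() { scopes.push(new HashMap<>()); }
    void leaveScope() { scopes.pop(); }

    /** Models a bare `my NAME;` in the current (innermost) scope. */
    void declare(String name) { scopes.peek().put(name, new ArrayList<>()); }

    /** Resolves a name from the innermost scope outwards. */
    List<Integer> lookup(String name) {
        for (Map<String, List<Integer>> scope : scopes) {   // innermost first
            List<Integer> value = scope.get(name);
            if (value != null) {
                return value;
            }
        }
        throw new IllegalStateException(
            "Global symbol \"" + name + "\" requires explicit package name");
    }

    /** Models: my @x = (1, 2) for 1 .. 3; followed by scalar @x in the same block. */
    static int outerSizeAfterLoop(boolean hoist) {
        HoistSketch s = new HoistSketch();
        s.enterScope();                      // enclosing block
        if (hoist) {
            s.declare("@x");                 // hoisted bare declaration
        }
        for (int i = 1; i <= 3; i++) {
            s.enterScope();                  // per-iteration body scope
            s.declare("@x");                 // inner `my @x` shadows the outer one
            s.lookup("@x").addAll(List.of(1, 2));
            s.leaveScope();                  // fresh instance is discarded
        }
        int size = s.lookup("@x").size();    // resolves to the outer @x, if declared
        s.leaveScope();
        return size;
    }

    public static void main(String[] args) {
        System.out.println(outerSizeAfterLoop(true));
    }
}
```

Calling outerSizeAfterLoop(false) throws in this model, which corresponds to the compile-time strict error seen before the fix.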

RC4 — split with zero-width ∥ consuming alternation (Operator.java)

Java's Matcher always tries alternations left-to-right, so in (?:\b|\s) the \b branch always wins and \s is never attempted at the same offset. Perl's split, in contrast, re-runs the regex with REG_NOTEMPTY_ATSTART after each zero-width match — a consuming alternative at the same position becomes an additional separator with an empty field between the two. After each zero-width match we now probe via Matcher.matches() on progressively larger regions starting at matchEnd; the shortest region the pattern matches gives the length of the consuming alternative.
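
The probing idea can be sketched in standalone Java roughly as follows. This is an illustration, not the code in Operator.java: SplitProbeSketch and both method names are hypothetical, the sketch supports only the no-limit form of split, and it assumes that the shortest non-empty match at an offset is the right separator, which holds for single-character alternatives such as \s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; class and method names are hypothetical.
class SplitProbeSketch {

    /** Length of the shortest non-empty match that starts exactly at pos, or -1. */
    static int shortestConsumingMatch(Pattern p, CharSequence input, int pos) {
        Matcher probe = p.matcher(input);
        probe.useTransparentBounds(true);   // \b and lookarounds may see text outside the region
        probe.useAnchoringBounds(false);    // ^ and $ must not match at artificial region edges
        for (int end = pos + 1; end <= input.length(); end++) {
            probe.region(pos, end);
            if (probe.matches()) {          // the whole region [pos, end) matched
                return end - pos;
            }
        }
        return -1;
    }

    /** Perl-like split without a limit: trailing empty fields are dropped. */
    static List<String> split(Pattern p, String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = p.matcher(input);
        int s = 0;                          // end of the previous separator
        while (s < input.length()) {
            if (!m.find(s)) {
                break;
            }
            int sepStart = m.start();
            int sepEnd = m.end();
            if (sepEnd == s) {
                // Zero-width match right at s: look for a consuming alternative here first.
                int n = shortestConsumingMatch(p, input, s);
                if (n > 0) {
                    sepStart = s;
                    sepEnd = s + n;
                } else if (m.find(s + 1)) { // otherwise take the next match further right
                    sepStart = m.start();
                    sepEnd = m.end();
                } else {
                    break;
                }
            }
            fields.add(input.substring(s, sepStart));
            s = sepEnd;
        }
        fields.add(input.substring(s));
        while (!fields.isEmpty() && fields.get(fields.size() - 1).isEmpty()) {
            fields.remove(fields.size() - 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String f : split(Pattern.compile("(?:\\b|\\s)"), "Lorem ipsum,")) {
            System.out.println("[" + f + "]");
        }
    }
}
```

The probe is linear in the remaining input for each zero-width position, so a production version would likely want a bound on the region length; that trade-off is outside the scope of this sketch.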

RC5 — undef-as-subscript warning (RuntimeArray.java, RuntimeHash.java)

RuntimeArray.get / RuntimeHash.get now emit Use of uninitialized value in array|hash element (category uninitialized) when called with an UNDEF index. Both the read and lvalue/autoviv paths go through get().
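
As a rough picture of the behaviour, the following Java toy warns on an undef subscript and then treats it as index 0, as perl does when undef is used as a number. ToyArray is hypothetical and is not RuntimeArray; the warnings list stands in for STDERR, a null Integer stands in for UNDEF, and negative indices are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not PerlOnJava's RuntimeArray.
class ToyArray {
    final List<String> elements = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();   // stand-in for STDERR

    /** Element read; index == null models an undef subscript. */
    String get(Integer index) {
        int i;
        if (index == null) {
            warnings.add("Use of uninitialized value in array element");
            i = 0;                                      // undef numifies to 0
        } else {
            i = index;
        }
        return (i >= 0 && i < elements.size()) ? elements.get(i) : null;
    }

    public static void main(String[] args) {
        ToyArray a = new ToyArray();
        a.elements.add("first");
        System.out.println(a.get(null));
        System.out.println(a.warnings);
    }
}
```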

Test plan

  • make (build + unit tests) green after every commit
  • ./jcpan -t List::MoreUtils — 60/61 files green, 1/61 deferred to weaken branch (was 53/61 on master)
  • Original 7 failing subtests: 6 now pass, 1 deferred (RC6)
  • Hand-verified perl-parity for representative regression cases:
    • use strict; my $x=1; @$x now dies (was silent)
    • print 1->[0] still silent (unchanged)
    • my @x = (1,2) for 1..3; print scalar @x — now prints 0, matching perl
    • split /(?:\b|\s)/, "Lorem ipsum," — matches perl

Generated with Devin

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock and others added 6 commits April 20, 2026 16:51
Under `use strict 'refs'`, dereferencing a scalar whose value is a plain
integer or double as an ARRAY/HASH ref now dies with the perl-compatible
message `Can't use string ("N") as an ARRAY ref while "strict refs" in use`
(previously `RuntimeScalar.arrayDeref()` silently returned an empty array
and `hashDeref()` threw "Not a HASH reference").

The same rule applies to read-only scalars that happen to hold a number
(e.g. a `foreach` loop variable aliased to a caller's literal argument),
matching perl: `for my $x (1) { @$x }` dies. Compile-time literal arrow
dereferences (`print 1->[0]`, `print 1->{a}`) stay silent via new
`arrayDerefGet` / `hashDerefGet` overrides on `RuntimeScalarReadOnly`,
also matching perl.

This unblocks four List::MoreUtils PP-mode tests whose `is_dying` checks
rely on this diagnostic: binsert.t, bremove.t, mesh.t, zip6.t.

Part of dev/modules/list_moreutils.md (RC1).

POSIX.pm already exported `setlocale`, `localeconv`, and the LC_* category
constants, but none of them had an implementation: any caller got
`Undefined subroutine &POSIX::setlocale`.

PerlOnJava cannot really switch the JVM/C locale, but adding minimal stubs
is enough for modules that just call `setlocale(LC_COLLATE, "C")` for its
return value or probe `localeconv()` for basic numeric formatting. The
stubs return the requested locale name (`C` by default) and a reasonable
default locale table.

Unblocks List::MoreUtils PP-mode minmaxstr.t (RC3 in
dev/modules/list_moreutils.md).

Perl treats the `my` declaration in
`my @long_list = EXPR for LIST;` / `my $x = EXPR while COND;` as
declared in the enclosing scope — the variable name is visible for the
rest of the block — while each loop iteration still creates a fresh
instance. Without this, the enclosing scope sees an undeclared variable
and `use strict` bails at compile time, e.g.:

    my @long_list = int rand(1000) for 0 .. 1E7;
    my @part      = part { ... } @long_list;   # strict error pre-fix

The parser now detects this pattern in the `for`/`foreach` and
`while`/`until` statement-modifier branches, emits a bare `my DECL;` in
the enclosing scope, and leaves the inner `my DECL = RHS` in the loop
body (wrapped in a BlockNode for `while`/`until` so the inner `my`
properly shadows the outer one). The end-of-loop value of the outer
variable matches perl: empty/undef, because the body's `my` introduces
a fresh per-iteration instance.

Unblocks List::MoreUtils PP-mode part.t (RC2 in
dev/modules/list_moreutils.md).

Perl's split re-runs its regex at the end of each zero-width match with
REG_NOTEMPTY_ATSTART. When a consuming alternative of the pattern also
matches at that position, the consumed characters become an additional
separator and an empty field appears between the two separators, e.g.

    split /(?:\b|\s)/, "Lorem ipsum,"
    # ("Lorem", "", "ipsum", ",")

Java's Matcher tries alternations left-to-right and stops at the first
match, so `\b` always wins and the `\s` alternative is never attempted
at the same offset. Without compensation, the space leaks into the next
field and the empty separator disappears: jperl was returning
`("Lorem", " ", "ipsum", ",")`.

After each zero-width match, we now probe with `Matcher.matches()` on
regions of increasing length starting at matchEnd. The shortest region
the full pattern matches gives the length of the consuming alternative;
if found, we emit an empty field and advance past the consumed
characters.

Unblocks List::MoreUtils PP-mode mode.t (RC4 in
dev/modules/list_moreutils.md).

Perl emits
    Use of uninitialized value in array element
    Use of uninitialized value in hash element
whenever an undef is used as a subscript under `use warnings`. PerlOnJava
wasn't producing this diagnostic, which made it impossible to test
patterns like `$parts[$code->($_)]` where `$code` returns undef.

The warning is emitted from the RuntimeArray.get / RuntimeHash.get
entry points, covering both the rvalue read path and the lvalue / autoviv
path (both go through get()).

Unblocks List::MoreUtils PP-mode part.t warnings checks (RC5 in
dev/modules/list_moreutils.md).

@fglock fglock merged commit d19b693 into master Apr 20, 2026
2 checks passed
@fglock fglock deleted the fix/list-moreutils branch April 20, 2026 15:32